We develop a method for analyzing the mixing times for a quite general class of Markov chains on the complete monomial group $G \wr S_n$ and a quite general class of Markov chains on the homogeneous space $(G \wr S_n)/(S_r \times S_{n-r})$. We derive an exact formula for the $L^2$ distance in terms of the $L^2$ distances to uniformity for closely related random walks on the symmetric groups $S_j$ for $1 \le j \le n$ or for closely related Markov chains on the homogeneous spaces $S_{i+j}/(S_i \times S_j)$ for various values of $i$ and $j$, respectively. Our results are consistent with those previously known, but our method is considerably simpler and more general.

1. Introduction and Summary. In the proofs of many of the results of Schoolfield (1999a), the $L^2$ distance to uniformity for the random walk (on the so-called wreath product of a group $G$ with the symmetric group $S_n$) being analyzed is often found to be expressible in terms of the $L^2$ distance to uniformity for related random walks on the symmetric groups $S_j$ with $1 \le j \le n$. Similarly, in the proofs of many of the results of Schoolfield (1999b), the $L^2$ distance to stationarity for the Markov chain being analyzed is often found to be expressible in terms of the $L^2$ distance to stationarity of related Markov chains on the homogeneous spaces $S_{i+j}/(S_i \times S_j)$ for various values of $i$ and $j$. It is from this observation that the results of this paper have evolved. We develop a method, with broad applications, for bounding the rate of convergence to stationarity for a general class of random walks and Markov chains in terms of closely related chains on the symmetric groups and related homogeneous spaces. Certain specialized problems of this sort were previously analyzed with the use of group representation theory. Our analysis is more directly probabilistic and yields some insight into the basic structure of the random walks and Markov chains being analyzed.

1.1. Markov Chains on $G \wr S_n$. We now describe one of the two basic set-ups we will be considering [namely, the one corresponding to the results in Schoolfield (1999a)]. Let $n$ be a positive integer and let $P$ be a probability measure defined on a finite set $G$ ($= \{1, \dots, m\}$, say). Imagine $n$ cards, labeled 1 through $n$ on their fronts, arranged on a table in sequential order. Write the number 1 on the back of each card. Now repeatedly permute the cards and rewrite the numbers on their backs, as follows. For each independent repetition, begin by choosing integers $i$ and $j$ independently and uniformly at random from $\{1, \dots, n\}$.

If $i \ne j$, transpose the cards in positions $i$ and $j$. Then, (probabilistically) independently of the choice of $i$ and $j$, replace the numbers on the backs of the transposed cards with two numbers chosen independently from $G$ according to $P$.

If $i = j$ (which occurs with probability $1/n$), leave all cards in their current positions. Then, again independently of the choice of $j$, replace the number on the back of the card in position $j$ by a number chosen according to $P$.
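The two cases above can be simulated directly. The following Python sketch is illustrative only: it assumes 0-based positions, stores the front labels and back numbers in two lists indexed by position, and uses the hypothetical name `transposition_step`.

```python
import random

def transposition_step(fronts, backs, G, P, rng):
    """One step of the random transpositions chain with back-number rewriting.

    fronts[pos] / backs[pos]: front label / back number of the card in position pos.
    G: possible back numbers; P: their probabilities (same order as G).
    Returns the chosen positions (i, j).
    """
    n = len(fronts)
    i, j = rng.randrange(n), rng.randrange(n)  # independent, uniform on positions
    if i != j:
        # Transpose the two cards. Their old back numbers are overwritten
        # immediately, so only the front labels need to be swapped.
        fronts[i], fronts[j] = fronts[j], fronts[i]
        backs[i] = rng.choices(G, weights=P)[0]
        backs[j] = rng.choices(G, weights=P)[0]
    else:
        # No card moves; only the back of the card in position j is rewritten.
        backs[j] = rng.choices(G, weights=P)[0]
    return i, j
```

Starting from `fronts = [1, 2, 3, 4, 5]` and `backs = [1] * 5`, repeated calls keep `fronts` a permutation of 1 through 5 and change only positions `i` and `j` at each step.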

Our interest is in bounding the mixing time for Markov chains of the sort we have described. More generally, consider any probability measure, say $\hat Q$, on the set of ordered pairs $\hat\pi$ of the form $\hat\pi = (\pi, J)$, where $\pi$ is a permutation of $\{1, \dots, n\}$ and $J$ is a subset of the set of fixed points of $\pi$. At each time step, we choose such a $\hat\pi$ according to $\hat Q$ and then (a) permute the cards by multiplying the current permutation of front-labels by $\pi$; and (b) replace the back-numbers of all cards whose positions have changed, and also of every card whose (necessarily unchanged) position belongs to $J$, by numbers chosen independently according to $P$.

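The specific transpositions example discussed above fits the more general description, taking $\hat Q$ to be defined by

$$
\begin{aligned}
\hat Q(e, \{j\}) &:= \frac{1}{n^2} && \text{for any } j \in [n], \text{ with } e \text{ the identity permutation},\\
\hat Q(\tau, \emptyset) &:= \frac{2}{n^2} && \text{for any transposition } \tau,\\
\hat Q(\hat\pi) &:= 0 && \text{otherwise}.
\end{aligned}
\tag{1.1}
$$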
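As a check on (1.1), the sketch below builds $\hat Q$ as a dictionary, confirms that its total mass is 1, and confirms that $\hat Q(\pi, J) = \hat Q(\pi^{-1}, J)$, the symmetry property used later. It is illustrative only: positions are 0-based, permutations are stored as tuples of images, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def q_hat_transpositions(n):
    """The measure (1.1) on augmented permutations of positions 0..n-1.

    Keys are pairs (perm, J) with perm[i] the image of i and J a frozenset
    of fixed points of perm; values are exact probabilities.
    """
    identity = tuple(range(n))
    q = {(identity, frozenset([j])): Fraction(1, n * n) for j in range(n)}
    for a, b in combinations(range(n), 2):
        tau = list(identity)
        tau[a], tau[b] = b, a  # the transposition (a b)
        q[(tuple(tau), frozenset())] = Fraction(2, n * n)
    return q

def inverse(perm):
    """Inverse of a permutation given as a tuple of images."""
    inv = [0] * len(perm)
    for i, image in enumerate(perm):
        inv[image] = i
    return tuple(inv)
```

The total mass is $n \cdot \frac{1}{n^2} + \binom{n}{2}\frac{2}{n^2} = \frac{1}{n} + \frac{n-1}{n} = 1$, which the dictionary reproduces exactly.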

When $m = 1$, i.e., when the aspect of back-number labeling is ignored, the state space of the chain can be identified with the symmetric group $S_n$, and the mixing time can be bounded as in the following classical result, which is Theorem 1 of Diaconis and Shahshahani (1981) and was later included in Diaconis (1988) as Theorem 5 in Section D of Chapter 3. The total variation norm ($\|\cdot\|_{\mathrm{TV}}$) and the $L^2$ norm ($\|\cdot\|_2$) will be reviewed in Section 1.3.

Theorem 1.2. Let $\nu^{*k}$ denote the distribution at time $k$ for the random transpositions chain (1.1) when $m = 1$, and let $U$ be the uniform distribution on $S_n$. Let $k = \frac{1}{2} n \log n + cn$. Then there exists a universal constant $a > 0$ such that

$$\|\nu^{*k} - U\|_{\mathrm{TV}} \le \tfrac{1}{2} \|\nu^{*k} - U\|_2 \le a e^{-2c} \quad \text{for all } c > 0.$$

Without reviewing the precise details, we remark that this bound is sharp, in that there is a matching lower bound for total variation (and hence also for $L^2$). Thus, roughly put, $\frac{1}{2} n \log n + cn$ steps are necessary and sufficient for approximate stationarity.

Now consider the chain (1.1) for general $m \ge 2$, but restrict attention to the case that $P$ is uniform on $G$. An elementary approach to bounding the mixing time is to combine the mixing time result of Theorem 1.2 (which measures how quickly the cards get mixed up) with a coupon collector's analysis (which measures how quickly their back-numbers become random). This approach is carried out in Theorem 3.6.4 of Schoolfield (1999a), but gives an upper bound only on total variation distance. If we are to use the chain's mixing-time analysis in conjunction with the powerful comparison technique of Diaconis and Saloff-Coste (1993a, 1993b) to bound mixing times for other more complicated chains, as is done for example in Section 4 of Schoolfield (1999a), we need an upper bound on $L^2$ distance.

Such a bound can be obtained using group representation theory. Indeed, the Markov chain we have described is a random walk on the complete monomial group $G \wr S_n$, which is the wreath product of the group $G$ with $S_n$; see Schoolfield (1999a) for further background and discussion. The following result is Theorem 3.1.3 of Schoolfield (1999a).

Theorem 1.3. Let $\nu^{*k}$ denote the distribution at time $k$ for the random transpositions chain (1.1) when $P$ is uniform on $G$ (with $|G| \ge 2$). Let $k = \frac{1}{2} n \log n + \frac{1}{4} n \log(|G| - 1) + cn$. Then there exists a universal constant $b > 0$ such that

$$\|\nu^{*k} - U\|_{\mathrm{TV}} \le \tfrac{1}{2} \|\nu^{*k} - U\|_2 \le b e^{-2c} \quad \text{for all } c > 0.$$

For $L^2$ distance (but not for TV distance), the presence of the additional term $\frac{1}{4} n \log(|G| - 1)$ in the mixing-time bound is "real," in that there is a matching lower bound: see the table at the end of Section 3.6 in Schoolfield (1999a).

The group-representation approach becomes substantially more difficult to carry out when the card-rearrangement scheme is something other than random transpositions, and prohibitively so if the resulting step-distribution on $S_n$ is not constant on conjugacy classes. Moreover, there is no possibility whatsoever of using this approach when $P$ is non-uniform, since then we are no longer dealing with random walk on a group.

In Section 2 we provide an $L^2$-analysis of our chain for completely general shuffles $\hat Q$ of the sort we have described. More specifically, in Theorem 2.3 we derive an exact formula for the $L^2$ distance to stationarity in terms of the $L^2$ distance for closely related random walks on the symmetric groups $S_j$ for $1 \le j \le n$. Subsequent corollaries establish more easily applied results in special cases. In particular, Corollary 2.8 extends Theorem 1.3 to handle non-uniform $P$.

Our new method does have its limitations. The back-number randomizations must not depend on the current back numbers (but rather must be chosen afresh from $P$), and they must be independent and identically distributed from card to card. So, for example, we do not know how to adapt our method to analyze the "paired-shuffles" random walk of Section 3.7 in Schoolfield (1999a).



1.2. Markov Chains on $(G \wr S_n)/(S_r \times S_{n-r})$. We now turn to our second basic set-up [namely, the one corresponding to the results in Schoolfield (1999b)]. Again, let $n$ be a positive integer and let $P$ be a probability measure defined on a finite set $G = \{1, \dots, m\}$.

Imagine two racks, the first with positions labeled 1 through $r$ and the second with positions labeled $r + 1$ through $n$. Without loss of generality, we assume that $1 \le r \le n/2$. Suppose that there are $n$ balls, labeled with serial numbers 1 through $n$, each initially placed at its corresponding rack position. On each ball is written the number 1, which we shall call its $G$-number. Now repeatedly rearrange the balls and rewrite their $G$-numbers, as follows.



Consider any $\hat Q$ as in Section 1.1. At each time step, choose $\hat\pi = (\pi, J)$ from $\hat Q$ and then (a) permute the balls by multiplying the current permutation of serial numbers by $\pi$; (b) independently, replace the $G$-numbers of all balls whose positions have changed as a result of the permutation, and also of every ball whose (necessarily unchanged) position belongs to $J$, by numbers chosen independently from $P$; and (c) rearrange the balls on each of the two racks so that their serial numbers are in increasing order.

Notice that steps (a)-(b) are carried out in precisely the same way as steps (a)-(b) in Section 1.1. The state of the system is completely determined, at each step, by the ordered $n$-tuple of $G$-numbers of the $n$ balls $1, 2, \dots, n$ and the unordered set of serial numbers of balls on the first rack. We have thus described a Markov chain on the set of all $|G|^n \cdot \binom{n}{r}$ ordered pairs of $n$-tuples of elements of $G$ and $r$-element subsets of a set with $n$ elements.
In our present setting, the transpositions example (1.1) fits the more general description, taking $\hat Q$ to be defined by

$$
\begin{aligned}
\hat Q(\kappa, \{j\}) &:= \frac{1}{n^2\, r!\,(n-r)!} && \text{where } \kappa \in K \text{ and } j \in [n],\\
\hat Q(\kappa, \{i, j\}) &:= \frac{2}{n^2\, r!\,(n-r)!} && \text{where } \kappa \in K \text{ and } i \ne j,\\
\hat Q(\tau\kappa, \emptyset) &:= \frac{2}{n^2\, r!\,(n-r)!} && \text{where } \tau\kappa \in TK,\\
\hat Q(\hat\pi) &:= 0 && \text{otherwise},
\end{aligned}
\tag{1.4}
$$

where $K := S_r \times S_{n-r}$, $T$ is the set of all transpositions in $S_n \setminus K$, and $TK := \{\tau\kappa \in S_n : \tau \in T \text{ and } \kappa \in K\}$. When $m = 1$, the state space of the chain can be identified with the homogeneous space $S_n/(S_r \times S_{n-r})$. The chain is then a variant of the celebrated Bernoulli-Laplace diffusion model. For the classical model, Diaconis and Shahshahani (1987) determined the mixing time. Similarly, Schoolfield (1999b) determined the mixing time of the present variant, which slows down the classical chain by a factor of $\frac{n^2}{2r(n-r)}$ by not forcing two balls to switch racks at each step. The following result is Theorem 2.5.3 of Schoolfield (1999b).

Theorem 1.5. Let $\nu^{*k}$ denote the distribution at time $k$ for the variant (1.4) of the Bernoulli-Laplace model when $m = 1$, and let $U$ be the uniform distribution on $S_n/(S_r \times S_{n-r})$. Let $k = \frac{1}{4} n (\log n + c)$. Then there exists a universal constant $a > 0$ such that

$$\|\nu^{*k} - U\|_{\mathrm{TV}} \le \tfrac{1}{2} \|\nu^{*k} - U\|_2 \le a e^{-2c} \quad \text{for all } c > 0.$$

Again there are matching lower bounds, for $r$ not too far from $n/2$, so this Markov chain is twice as fast to converge as the random walk of Theorem 1.2.

The following analogue, for the special case $m = 2$, of Theorem 1.3 in the present setting was obtained as Theorem 3.1.3 of Schoolfield (1999b).

Theorem 1.6. Let $\nu^{*k}$ denote the distribution at time $k$ for the variant (1.4) of the Bernoulli-Laplace model when $P$ is uniform on $G$ with $|G| = 2$. Let $k = \frac{1}{4} n (\log n + c)$. Then there exists a universal constant $b > 0$ such that

$$\|\nu^{*k} - U\|_{\mathrm{TV}} \le \tfrac{1}{2} \|\nu^{*k} - U\|_2 \le b e^{-c/2} \quad \text{for all } c > 0.$$



Notice that Theorem 1.6 provides (essentially) the same mixing time bound as that found in Theorem 1.5. Again there are matching lower bounds, for $r$ not too far from $n/2$, so this Markov chain is twice as fast to converge as the random walk of Theorem 1.3 in the special case $m = 2$.

In Section 3 we provide a general $L^2$-analysis of our chain, which has state space equal to the homogeneous space $(G \wr S_n)/(S_r \times S_{n-r})$. More specifically, in Theorem 3.3 we derive an exact formula for the $L^2$ distance to stationarity in terms of the $L^2$ distance for closely related Markov chains on the homogeneous spaces $S_{i+j}/(S_i \times S_j)$ for various values of $i$ and $j$. Subsequent corollaries establish more easily applied results in special cases. In particular, Corollary 3.8 extends Theorem 1.6 to handle non-uniform $P$.

Again, our method does have its limitations. For example, we do not know how to adapt 
our method to analyze the "paired-flips" Markov chain of Section 3.4 in Schoolfield (1999b). 

1.3. Distances Between Probability Measures. We now review several ways of measuring distances between probability measures on a finite set $G$. Let $R$ be a fixed reference probability measure on $G$ with $R(g) > 0$ for all $g \in G$. As discussed in Aldous and Fill (200x), for each $1 \le p < \infty$ define the $L^p$ norm $\|\nu\|_p$ of any signed measure $\nu$ on $G$ (with respect to $R$) by



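$$\|\nu\|_p := \left( \mathbf{E}_R \left| \frac{\nu}{R} \right|^p \right)^{1/p} = \left( \sum_{g \in G} \frac{|\nu(g)|^p}{R(g)^{p-1}} \right)^{1/p}.$$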


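Thus the $L^p$ distance between any two probability measures $P$ and $Q$ on $G$ (with respect to $R$) is

$$\|P - Q\|_p = \left( \mathbf{E}_R \left| \frac{P - Q}{R} \right|^p \right)^{1/p} = \left( \sum_{g \in G} \frac{|P(g) - Q(g)|^p}{R(g)^{p-1}} \right)^{1/p}.$$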



Notice that

$$\|P - Q\|_1 = \sum_{g \in G} |P(g) - Q(g)|.$$

In our applications we will always take $Q = R$ (and $R$ will always be the stationary distribution of the Markov chain under consideration at that time). In that case, when $U$ is the uniform distribution on $G$,

$$\|P - U\|_2 = \left( |G| \sum_{g \in G} |P(g) - U(g)|^2 \right)^{1/2}.$$

The total variation distance between $P$ and $Q$ is defined by

$$\|P - Q\|_{\mathrm{TV}} := \max_{A \subseteq G} |P(A) - Q(A)|.$$
Notice that $\|P - Q\|_{\mathrm{TV}} = \frac{1}{2} \|P - Q\|_1$. It is a direct consequence of the Cauchy-Schwarz inequality that

$$\|P - U\|_{\mathrm{TV}} \le \tfrac{1}{2} \|P - U\|_2.$$
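These definitions translate directly into code. The sketch below is illustrative only (the function names are hypothetical, and total variation is computed by brute force over all subsets, so it is meant for very small $G$); it can be used to check $\|P - Q\|_{\mathrm{TV}} = \frac12 \|P - Q\|_1$ and $\|P - U\|_{\mathrm{TV}} \le \frac12 \|P - U\|_2$ on examples.

```python
from itertools import combinations

def lp_distance(P, Q, R, p):
    """L^p distance between P and Q with respect to the reference measure R.

    P, Q, R are dicts over the same finite set G, with R(g) > 0 for all g.
    """
    total = sum(abs(P[g] - Q[g]) ** p / R[g] ** (p - 1) for g in R)
    return total ** (1.0 / p)

def tv_distance(P, Q):
    """max over subsets A of G of |P(A) - Q(A)|, by brute force (small G only)."""
    G = list(P)
    return max(
        abs(sum(P[g] - Q[g] for g in A))
        for size in range(len(G) + 1)
        for A in combinations(G, size)
    )
```

For instance, with $P = (0.5, 0.3, 0.2)$ and $U$ uniform on three points, $\|P - U\|_{\mathrm{TV}} = 1/6$, while $\frac12\|P - U\|_2 = \frac12\sqrt{0.14} \approx 0.187$.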

If $P(\cdot, \cdot)$ is a reversible transition matrix on $G$ with stationary distribution $R = P^\infty(\cdot)$, then, for any $g_0 \in G$,

$$\|P^k(g_0, \cdot) - P^\infty(\cdot)\|_2^2 = \frac{P^{2k}(g_0, g_0)}{P^\infty(g_0)} - 1.$$

All of the distances we have discussed here are indeed metrics on the space of probability measures on $G$.
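The reversibility identity above is what turns each $L^2$ computation below into a return-probability computation. The following exact-arithmetic sketch checks it numerically; the three-state chain (a random walk on a weighted graph, chosen only for illustration) and the helper names are hypothetical.

```python
from fractions import Fraction

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)] for i in range(n)]

def mat_pow(A, k):
    n = len(A)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result

# Random walk on a weighted graph: P(x, y) = w(x, y) / w(x) is reversible
# with respect to pi(x) = w(x) / (sum of all w(x)).
W = [[1, 2, 0], [2, 1, 1], [0, 1, 3]]
row = [sum(r) for r in W]
P = [[Fraction(W[x][y], row[x]) for y in range(3)] for x in range(3)]
pi = [Fraction(row[x], sum(row)) for x in range(3)]

def l2_sq_direct(g0, k):
    """||P^k(g0, .) - pi||_2^2 computed from the definition."""
    Pk = mat_pow(P, k)
    return sum((Pk[g0][g] - pi[g]) ** 2 / pi[g] for g in range(3))

def l2_sq_return(g0, k):
    """The same quantity via P^{2k}(g0, g0) / pi(g0) - 1."""
    return mat_pow(P, 2 * k)[g0][g0] / pi[g0] - 1
```

Because all arithmetic is rational, the two functions agree exactly rather than up to rounding.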

2. Markov Chains on $G \wr S_n$. We now analyze a very general Markov chain on the complete monomial group $G \wr S_n$. It should be noted that, in the results which follow, there is no essential use of the group structure of $G$. So the results of this section extend simply; in general, the Markov chain of interest is on the set $G^n \times S_n$.

2.1. A Class of Chains on $G \wr S_n$. We introduce a generalization of permutations $\pi \in S_n$ which will provide an extra level of generality in the results that follow. Recall that any permutation $\pi \in S_n$ can be written as the product of disjoint cyclic factors, say

$$\pi = \big(i_1^{(1)}\, i_2^{(1)} \cdots i_{k_1}^{(1)}\big)\big(i_1^{(2)}\, i_2^{(2)} \cdots i_{k_2}^{(2)}\big) \cdots \big(i_1^{(\ell)}\, i_2^{(\ell)} \cdots i_{k_\ell}^{(\ell)}\big),$$

where the $K := k_1 + \cdots + k_\ell$ numbers $i_b^{(a)}$ are distinct elements from $[n] := \{1, 2, \dots, n\}$ and we may suppose $k_a \ge 2$ for $1 \le a \le \ell$. The $n - K$ elements of $[n]$ not included among the $i_b^{(a)}$ are each fixed by $\pi$; we denote this $(n - K)$-set by $F(\pi)$.

We refer to the ordered pair of a permutation $\pi \in S_n$ and a subset $J$ of $F(\pi)$ as an augmented permutation. We denote the set of all such ordered pairs $\hat\pi = (\pi, J)$, with $\pi \in S_n$ and $J \subseteq F(\pi)$, by $\hat S_n$. For example, $\hat\pi \in \hat S_{10}$ given by $\hat\pi = ((1\,2)(3\,4)(5\,6\,7), \{8, 10\})$ is the augmentation of the permutation $\pi = (1\,2)(3\,4)(5\,6\,7) \in S_{10}$ by the subset $\{8, 10\}$ of $F(\pi) = \{8, 9, 10\}$. Notice that any given $\hat\pi \in \hat S_n$ corresponds to a unique permutation $\pi \in S_n$; denote the mapping $\hat\pi \mapsto \pi$ by $T$. For $\hat\pi = (\pi, J) \in \hat S_n$ define $I(\hat\pi)$ to be the set of indices $i$ included in $\hat\pi$, in the sense that either $i$ is not a fixed point of $\pi$ or $i \in J$; for our example, $I(\hat\pi) = \{1, 2, 3, 4, 5, 6, 7, 8, 10\}$.
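The sets $F(\pi)$ and $I(\hat\pi)$ are easy to compute from a cycle decomposition. The Python sketch below reproduces the example just given; it is illustrative only, the function names are hypothetical, and a permutation is assumed to be given as a list of its cycles on $[n]$.

```python
def fixed_points(cycles, n):
    """F(pi): the elements of [n] = {1, ..., n} not moved by any cycle of length >= 2."""
    moved = {i for cycle in cycles if len(cycle) >= 2 for i in cycle}
    return set(range(1, n + 1)) - moved

def included_indices(cycles, J, n):
    """I(pi_hat) for pi_hat = (pi, J): the indices deranged by pi, together with J."""
    F = fixed_points(cycles, n)
    if not set(J) <= F:
        raise ValueError("J must be a subset of the fixed points F(pi)")
    return (set(range(1, n + 1)) - F) | set(J)

cycles = [(1, 2), (3, 4), (5, 6, 7)]
print(sorted(fixed_points(cycles, 10)))               # [8, 9, 10]
print(sorted(included_indices(cycles, {8, 10}, 10)))  # [1, 2, 3, 4, 5, 6, 7, 8, 10]
```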

Let $\hat Q$ be a probability measure on $\hat S_n$ such that

$$\hat Q(\pi, J) = \hat Q(\pi^{-1}, J) \quad \text{for all } \pi \in S_n \text{ and } J \subseteq F(\pi) = F(\pi^{-1}). \tag{2.0}$$
We refer to this property as augmented symmetry. This terminology is (in part) justified by the fact that if $\hat Q$ is augmented symmetric, then the measure $Q$ on $S_n$ induced by $T$ is given by

$$Q(\pi) = \sum_{J \subseteq F(\pi)} \hat Q((\pi, J)) = Q(\pi^{-1}) \quad \text{for each } \pi \in S_n$$

and so is symmetric in the usual sense. We assume that $Q$ is not concentrated on a subgroup of $S_n$ or a coset thereof. Thus $Q^{*k}$ approaches the uniform distribution $U$ on $S_n$ for large $k$.

Suppose that $G$ is a finite group. Label the elements of $G$ as $g_1, g_2, \dots, g_{|G|}$. Let $P$ be a probability measure defined on $G$. Define $p_i := P(g_i)$ for $1 \le i \le |G|$. To avoid trivialities, we suppose $p_{\min} := \min\{p_i : 1 \le i \le |G|\} > 0$.

Let $\hat\xi_1, \hat\xi_2, \dots$ be a sequence of independent augmented permutations each distributed according to $\hat Q$. These correspond uniquely to a sequence $\xi_1, \xi_2, \dots$ of permutations each distributed according to $Q$. Define $Y := (Y_0, Y_1, Y_2, \dots)$ to be the random walk on $S_n$ with $Y_0 := e$ and $Y_k := \xi_k \xi_{k-1} \cdots \xi_1$ for all $k \ge 1$. (There is no loss of generality in defining $Y_0 := e$, as any other $\pi \in S_n$ can be transformed to the identity by a permutation of the labels.)

Define $X := (X_0, X_1, X_2, \dots)$ to be the Markov chain on $G^n$ such that $X_0 := \vec x_0 = (\chi_1, \dots, \chi_n)$ with $\chi_i \in G$ for $1 \le i \le n$ and, at each step $k$ for $k \ge 1$, the entries of $X_{k-1}$ whose positions are included in $I(\hat\xi_k)$ are independently changed to an element of $G$ distributed according to $P$.

Define $W := (W_0, W_1, W_2, \dots)$ to be the Markov chain on $G \wr S_n$ such that $W_k := (X_k; Y_k)$ for all $k \ge 0$. Notice that the random walk on $G \wr S_n$ analyzed in Theorem 1.3 is a special case of $W$, with $P$ being the uniform distribution and $\hat Q$ being defined as at (1.1). Let $P(\cdot, \cdot)$ be the transition matrix for $W$ and let $P^\infty(\cdot)$ be the stationary distribution for $W$.
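Notice that

$$P^\infty(\vec x; \pi) = \frac{1}{n!} \prod_{i=1}^{n} p_{x_i}$$

for any $(\vec x; \pi) \in G \wr S_n$ and that

$$P\big((\vec x; \pi), (\vec y; \sigma)\big) = \sum_{\hat\rho \in \hat S_n :\, T(\hat\rho) = \sigma\pi^{-1}} \hat Q(\hat\rho) \Big[\prod_{j \in I(\hat\rho)} p_{y_j}\Big] \Big[\prod_{\ell \notin I(\hat\rho)} \mathbf{1}(x_\ell = y_\ell)\Big]$$

for any $(\vec x; \pi), (\vec y; \sigma) \in G \wr S_n$. Thus, using the augmented symmetry of $\hat Q$,

$$
\begin{aligned}
P^\infty(\vec x; \pi)\, P\big((\vec x; \pi), (\vec y; \sigma)\big)
&= \frac{1}{n!} \sum_{\hat\rho \in \hat S_n :\, T(\hat\rho) = \sigma\pi^{-1}} \hat Q(\hat\rho) \Big[\prod_{i=1}^{n} p_{x_i}\Big] \Big[\prod_{j \in I(\hat\rho)} p_{y_j}\Big] \Big[\prod_{\ell \notin I(\hat\rho)} \mathbf{1}(x_\ell = y_\ell)\Big]\\
&= \frac{1}{n!} \sum_{\hat\rho \in \hat S_n :\, T(\hat\rho) = \pi\sigma^{-1}} \hat Q(\hat\rho) \Big[\prod_{i=1}^{n} p_{y_i}\Big] \Big[\prod_{j \in I(\hat\rho)} p_{x_j}\Big] \Big[\prod_{\ell \notin I(\hat\rho)} \mathbf{1}(y_\ell = x_\ell)\Big]\\
&= P^\infty(\vec y; \sigma)\, P\big((\vec y; \sigma), (\vec x; \pi)\big).
\end{aligned}
$$

(The middle equality reindexes the sum by $\hat\rho = (\rho, J) \mapsto (\rho^{-1}, J)$, which preserves $\hat Q$ by (2.0) and preserves $I(\hat\rho)$ because $F(\rho) = F(\rho^{-1})$; the summand is unchanged because $\big[\prod_{i=1}^{n} p_{x_i}\big]\big[\prod_{j \in I(\hat\rho)} p_{y_j}\big]\big[\prod_{\ell \notin I(\hat\rho)} \mathbf{1}(x_\ell = y_\ell)\big] = \prod_{j \in I(\hat\rho)} p_{x_j} p_{y_j} \prod_{\ell \notin I(\hat\rho)} p_{x_\ell} \mathbf{1}(x_\ell = y_\ell)$ is symmetric in $\vec x$ and $\vec y$.)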

Therefore, P is reversible, which is a necessary condition in order to apply the comparison 
technique of Diaconis and Saloff-Coste (1993a). 

2.2. Convergence to Stationarity: Main Result. For notational purposes, let

$$\mu_n(J) := \hat Q\{\hat\sigma \in \hat S_n : I(\hat\sigma) \subseteq J\}. \tag{2.1}$$

For any $J \subseteq [n]$, let $S(J)$ be the subgroup of $S_n$ consisting of those $\sigma \in S_n$ with $[n] \setminus F(\sigma) \subseteq J$. If $\hat\pi \in \hat S_n$ is random with distribution $\hat Q$, then, when the conditioning event

$$E := \{I(\hat\pi) \subseteq J\} \subseteq \{[n] \setminus F(T(\hat\pi)) \subseteq J\}$$

has positive probability, the probability measure induced by $T$ from the conditional distribution (call it $\hat Q_{\hat S(J)}$) of $\hat\pi$ given $E$ is concentrated on $S(J)$. Call this induced measure $Q_{S(J)}$. Notice that $\hat Q_{\hat S(J)}$, like $\hat Q$, is augmented symmetric and hence that $Q_{S(J)}$ is symmetric on $S(J)$. Let $U_{S(J)}$ be the uniform measure on $S(J)$. For notational purposes, let

$$d_k(J) := \big\| Q_{S(J)}^{*k} - U_{S(J)} \big\|_2^2. \tag{2.2}$$
Example. Let $\hat Q$ be defined as at (1.1). Then $\hat Q$ satisfies the augmented symmetry property (2.0). In Corollary 2.8 we will be using $\hat Q$ to define a random walk on $G \wr S_n$ which is precisely the random walk analyzed in Theorem 1.3.

For now, however, we will be satisfied to determine $\hat Q_{\hat S(J)}$ and $Q_{S(J)}$, where $J \subseteq [n]$ is nonempty. It is easy to verify that

$$
\begin{aligned}
\hat Q_{\hat S(J)}(e, \{j\}) &:= \frac{1}{|J|^2} && \text{for each } j \in J,\\
\hat Q_{\hat S(J)}((p\ q), \emptyset) &:= \frac{2}{|J|^2} && \text{for each transposition } (p\ q) \in S_n \text{ with } \{p, q\} \subseteq J,\\
\hat Q_{\hat S(J)}(\hat\pi) &:= 0 && \text{otherwise},
\end{aligned}
$$

and hence that $\hat Q_{\hat S(J)}$ is the probability measure defined at (1.1), but with $[n]$ changed to $J$. Thus, roughly put, the random walk analyzed in Theorem 1.3, conditionally restricted to the indices in $J$, gives a random walk "as if $J$ were the only indices."
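For the example (1.1) the normalizing constant is $\mu_n(J) = |J|/n^2 + 2\binom{|J|}{2}/n^2 = (|J|/n)^2$, which reappears in the proof of Corollary 2.8. The sketch below checks this and the conditional masses by enumeration. It is illustrative only: each augmented permutation is represented by a hypothetical tag together with its index set $I(\hat\pi)$, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def augmented_steps(n):
    """The measure (1.1) as (tag, I(pi_hat), probability) triples on [n] = {1, ..., n}."""
    steps = [(("e", j), frozenset([j]), Fraction(1, n * n)) for j in range(1, n + 1)]
    for a, b in combinations(range(1, n + 1), 2):
        steps.append((("tau", a, b), frozenset([a, b]), Fraction(2, n * n)))
    return steps

def mu(J, steps):
    """mu_n(J) = Q_hat{I(sigma_hat) subset of J}, as in (2.1)."""
    return sum((w for _, I, w in steps if I <= J), Fraction(0))

def conditional(J, steps):
    """Q_hat conditioned on the event {I(pi_hat) subset of J}; requires mu_n(J) > 0."""
    m = mu(J, steps)
    return {tag: w / m for tag, I, w in steps if I <= J}
```

For $n = 5$ and $J = \{1, 2, 4\}$, the conditional law puts mass $1/9$ on each $(e, \{j\})$ with $j \in J$ and $2/9$ on each transposition inside $J$, matching the display above with $|J| = 3$.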

The following result establishes an upper bound on the total variation distance by deriving an exact formula for $\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\|_2^2$.

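Theorem 2.3. Let $W$ be the Markov chain on the complete monomial group $G \wr S_n$ defined in Section 2.1. Then

$$
\begin{aligned}
\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_{\mathrm{TV}}^2
&\le \tfrac{1}{4} \big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_2^2\\
&= \frac{1}{4} \sum_{J :\, J \subseteq [n]} \frac{n!}{|J|!} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big] \mu_n(J)^{2k}\, d_k(J)
+ \frac{1}{4} \sum_{J :\, J \subsetneq [n]} \frac{n!}{|J|!} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big] \mu_n(J)^{2k},
\end{aligned}
$$

where $\mu_n(J)$ and $d_k(J)$ are defined at (2.1) and (2.2), respectively.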



Before proceeding to the proof, we note the following. In the present setting, the argument used to prove Theorem 3.6.4 of Schoolfield (1999a) gives the upper bound

$$\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_{\mathrm{TV}} \le \|Q^{*k} - U_{S_n}\|_{\mathrm{TV}} + \mathbf{P}(T > k),$$

where $T := \inf\{k \ge 1 : H_k = [n]\}$ and $H_k$ is the set of indices included in at least one of $\hat\xi_1, \dots, \hat\xi_k$ (as defined at the outset of the proof below). Theorem 2.3 provides a similar type of upper bound, but (a) we work with $L^2$ distance instead of total variation distance and (b) the analysis is more intricate, involving the need to consider how many steps are needed to escape sets $J$ of positions and also the need to know the $L^2$ distances for the random walks on the subgroups $S(J)$, $J \subseteq [n]$. However, Theorem 2.3 does derive an exact formula for the $L^2$ distance.

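Proof. For each $k \ge 1$, let $H_k := \bigcup_{\ell=1}^{k} I(\hat\xi_\ell) \subseteq [n]$, so $H_k$ is the (random) set of indices included in at least one of the augmented permutations $\hat\xi_1, \dots, \hat\xi_k$. For any given $w = (\vec x; \pi) \in G \wr S_n$, let $A \subseteq [n]$ be the set of indices $i$ such that $x_i \ne \chi_i$, where $x_i$ is the $i$th entry of $\vec x$ and $\chi_i$ is the $i$th entry of $\vec x_0$, and let $B = [n] \setminus F(\pi)$ be the set of indices deranged by $\pi$. Notice that $H_k \supseteq A \cup B$ on the event $\{W_k = (\vec x; \pi)\}$. Then

$$
\begin{aligned}
\mathbf{P}(W_k = (\vec x; \pi)) &= \sum_{C :\, A \cup B \subseteq C \subseteq [n]} \mathbf{P}(H_k = C, W_k = (\vec x; \pi))\\
&= \sum_{C :\, A \cup B \subseteq C \subseteq [n]} \mathbf{P}(H_k = C, Y_k = \pi) \cdot \mathbf{P}(X_k = \vec x \mid H_k = C, Y_k = \pi)\\
&= \sum_{C :\, A \cup B \subseteq C \subseteq [n]} \mathbf{P}(H_k = C, Y_k = \pi) \prod_{i \in C} p_{x_i}.
\end{aligned}
$$

For any $J \subseteq [n]$, we have $\mathbf{P}(H_k \subseteq J, Y_k = \pi) = 0$ unless $B \subseteq J \subseteq [n]$, in which case

$$
\begin{aligned}
\mathbf{P}(H_k \subseteq J, Y_k = \pi) &= \mathbf{P}(H_k \subseteq J)\, \mathbf{P}(Y_k = \pi \mid H_k \subseteq J)\\
&= \big(\hat Q\{\hat\sigma \in \hat S_n : I(\hat\sigma) \subseteq J\}\big)^k\, \mathbf{P}(Y_k = \pi \mid H_k \subseteq J)\\
&= \mu_n(J)^k\, \mathbf{P}(Y_k = \pi \mid H_k \subseteq J).
\end{aligned}
$$

Then, by Möbius inversion [see, e.g., Stanley (1986), Section 3.7], for any $C \subseteq [n]$ we have

$$\mathbf{P}(H_k = C, Y_k = \pi) = \sum_{J :\, J \subseteq C} (-1)^{|C \setminus J|}\, \mathbf{P}(H_k \subseteq J, Y_k = \pi) = \sum_{J :\, B \subseteq J \subseteq C} (-1)^{|C \setminus J|}\, \mu_n(J)^k\, \mathbf{P}(Y_k = \pi \mid H_k \subseteq J).$$

Combining these results gives

$$
\begin{aligned}
\mathbf{P}(W_k = (\vec x; \pi)) &= \sum_{C :\, A \cup B \subseteq C \subseteq [n]}\ \sum_{J :\, B \subseteq J \subseteq C} (-1)^{|C \setminus J|}\, \mu_n(J)^k\, \mathbf{P}(Y_k = \pi \mid H_k \subseteq J) \prod_{i \in C} p_{x_i}\\
&= \sum_{J :\, B \subseteq J \subseteq [n]} (-1)^{|J|}\, \mu_n(J)^k\, \mathbf{P}(Y_k = \pi \mid H_k \subseteq J) \sum_{C :\, A \cup J \subseteq C \subseteq [n]}\ \prod_{i \in C} (-p_{x_i}).
\end{aligned}
$$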
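But for any $D \subseteq [n]$, we have

$$\sum_{C :\, D \subseteq C \subseteq [n]}\ \prod_{i \in C} (-p_{x_i}) = \Big[\prod_{i \in D} (-p_{x_i})\Big] \sum_{E :\, E \subseteq [n] \setminus D}\ \prod_{i \in E} (-p_{x_i}) = \Big[\prod_{i \in D} (-p_{x_i})\Big] \prod_{i \in [n] \setminus D} (1 - p_{x_i}) = \prod_{i \in [n]} \big[1 - I_D(i) - p_{x_i}\big],$$

where (as usual) $I_D(i) = 1$ if $i \in D$ and $I_D(i) = 0$ if $i \notin D$. Therefore

$$\mathbf{P}(W_k = (\vec x; \pi)) = \sum_{J :\, B \subseteq J \subseteq [n]} (-1)^{|J|}\, \mu_n(J)^k\, \mathbf{P}(Y_k = \pi \mid H_k \subseteq J) \prod_{i=1}^{n} \big[1 - I_{A \cup J}(i) - p_{x_i}\big].$$

In particular, when $(\vec x; \pi) = (\vec x_0; e)$, we have $A = \emptyset = B$ and

$$
\begin{aligned}
\mathbf{P}(W_k = (\vec x_0; e)) &= \sum_{J :\, J \subseteq [n]} (-1)^{|J|}\, \mu_n(J)^k\, \mathbf{P}(Y_k = e \mid H_k \subseteq J) \prod_{i=1}^{n} \big[1 - I_J(i) - p_{\chi_i}\big]\\
&= \Big[\prod_{i=1}^{n} p_{\chi_i}\Big] \sum_{J :\, J \subseteq [n]} \mu_n(J)^k\, \mathbf{P}(Y_k = e \mid H_k \subseteq J) \prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big).
\end{aligned}
$$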



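Notice that $\{H_k \subseteq J\} = \bigcap_{\ell=1}^{k} \{I(\hat\xi_\ell) \subseteq J\}$ for any $k$ and $J$. So $\mathcal{L}((Y_0, Y_1, \dots, Y_k) \mid H_k \subseteq J)$ is the law of a random walk on $S_n$ (through step $k$) with step distribution $Q_{S(J)}$. (Terms with $\mu_n(J) = 0$ vanish for $k \ge 1$ and may be omitted.) Thus, using the reversibility of $P$ and the symmetry of $Q_{S(J)}$,

$$
\begin{aligned}
\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_2^2
&= \frac{P^{2k}((\vec x_0; e), (\vec x_0; e))}{P^\infty(\vec x_0; e)} - 1\\
&= n! \sum_{J :\, J \subseteq [n]} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big] \mu_n(J)^{2k}\, \mathbf{P}(Y_{2k} = e \mid H_{2k} \subseteq J) - 1\\
&= n! \sum_{J :\, J \subseteq [n]} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big] \mu_n(J)^{2k}\, Q_{S(J)}^{*2k}(e) - 1\\
&= \sum_{J :\, J \subseteq [n]} \frac{n!}{|J|!} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big] \mu_n(J)^{2k} \big(d_k(J) + 1\big) - 1\\
&= \sum_{J :\, J \subseteq [n]} \frac{n!}{|J|!} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big] \mu_n(J)^{2k}\, d_k(J) + \sum_{J :\, J \subsetneq [n]} \frac{n!}{|J|!} \Big[\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big)\Big] \mu_n(J)^{2k}.
\end{aligned}
$$

Here the fourth equality uses $|J|!\, Q_{S(J)}^{*2k}(e) = d_k(J) + 1$, which is the identity of Section 1.3 applied to the random walk on $S(J)$ (reversible with respect to $U_{S(J)}$, with $U_{S(J)}(e) = 1/|J|!$), and the last equality uses the fact that the $J = [n]$ term of the second sum equals 1. The desired result follows. □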
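Because Theorem 2.3 is an exact identity, it can be checked by brute force on very small examples. The following Python sketch is illustrative only: it assumes the step distribution (1.1), $G = \{0, 1\}$, 0-based positions, and permutations stored as tuples of images, and all function names are hypothetical. It propagates the exact law of $W_k$ with rational arithmetic and compares $\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\|_2^2$ with four times the right-hand side of Theorem 2.3, computing each $d_k(J)$ directly from its definition (2.2).

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import factorial

def steps_11(n):
    """The measure (1.1) on positions 0..n-1 as (perm, I(step), probability) triples."""
    e = tuple(range(n))
    out = [(e, frozenset([j]), Fraction(1, n * n)) for j in range(n)]
    for a, b in combinations(range(n), 2):
        t = list(e)
        t[a], t[b] = b, a
        out.append((tuple(t), frozenset([a, b]), Fraction(2, n * n)))
    return out

def compose(s, t):
    """The product s t: apply t first, then s."""
    return tuple(s[t[i]] for i in range(len(t)))

def l2_squared_brute_force(n, G, p, x0, k):
    """||P^k((x0; e), .) - P^infinity||_2^2 from the exact law of W_k."""
    steps, e = steps_11(n), tuple(range(n))
    dist = {(x0, e): Fraction(1)}
    for _ in range(k):
        new = {}
        for (x, y), mass in dist.items():
            for perm, I, w in steps:
                idx = sorted(I)
                # Every entry of X whose position lies in I is redrawn independently from p.
                for vals in product(G, repeat=len(idx)):
                    x2, prob = list(x), mass * w
                    for i, v in zip(idx, vals):
                        x2[i] = v
                        prob *= p[v]
                    key = (tuple(x2), compose(perm, y))  # Y_k = xi_k Y_{k-1}
                    new[key] = new.get(key, 0) + prob
        dist = new
    total = Fraction(0)
    for x in product(G, repeat=n):
        stat = Fraction(1, factorial(n))  # P^infinity(x; pi) = prod_i p_{x_i} / n!
        for xi in x:
            stat *= p[xi]
        for y in permutations(range(n)):
            total += (dist.get((x, y), 0) - stat) ** 2 / stat
    return total

def l2_squared_formula(n, p, x0, k):
    """Four times the right-hand side of Theorem 2.3, for the measure (1.1)."""
    steps, e = steps_11(n), tuple(range(n))
    total = Fraction(-1)
    for size in range(n + 1):
        for J in map(frozenset, combinations(range(n), size)):
            allowed = [(perm, w) for perm, I, w in steps if I <= J]
            mu = sum((w for _, w in allowed), Fraction(0))  # mu_n(J), see (2.1)
            if mu == 0 and k > 0:
                continue  # the term carries the factor mu_n(J)^(2k) = 0
            walk = {e: Fraction(1)}  # law after k steps of the walk with step law Q_{S(J)}
            for _ in range(k):
                nxt = {}
                for y, m in walk.items():
                    for perm, w in allowed:
                        z = compose(perm, y)
                        nxt[z] = nxt.get(z, 0) + m * w / mu
                walk = nxt
            # d_k(J) = ||Q_{S(J)}^{*k} - U_{S(J)}||_2^2, where |S(J)| = |J|!.
            order, d_k = factorial(size), Fraction(0)
            for images in permutations(sorted(J)):
                s = list(e)
                for a, b in zip(sorted(J), images):
                    s[a] = b
                d_k += order * (walk.get(tuple(s), 0) - Fraction(1, order)) ** 2
            weight = Fraction(factorial(n), order)
            for i in range(n):
                if i not in J:
                    weight *= 1 / p[x0[i]] - 1
            total += weight * mu ** (2 * k) * (d_k + 1)
    return total
```

With `p = {0: Fraction(1, 3), 1: Fraction(2, 3)}` the two functions agree exactly for $n \le 3$ and $k \le 2$. The brute-force side costs on the order of $|G|^n n!$ per step, so this is a sanity check rather than a practical tool.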



2.3. Corollaries. We now establish several corollaries to our main result. 



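Corollary 2.4. Let $W$ be the Markov chain on the complete monomial group $G \wr S_n$ as in Theorem 2.3. For $0 \le j \le n$, let

$$M_n(j) := \max\{\mu_n(J) : |J| = j\} \quad \text{and} \quad D_k(j) := \max\{d_k(J) : |J| = j\}.$$

Also let

$$B(n, k) := \max\{D_k(j) : 0 \le j \le n\} = \max\{d_k(J) : J \subseteq [n]\}.$$

Then

$$
\begin{aligned}
\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_{\mathrm{TV}}^2 &\le \tfrac{1}{4} \big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_2^2\\
&\le \frac{1}{4} B(n, k) \sum_{j=0}^{n} \binom{n}{j} \frac{n!}{j!} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-j} M_n(j)^{2k} + \frac{1}{4} \sum_{j=0}^{n-1} \binom{n}{j} \frac{n!}{j!} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-j} M_n(j)^{2k}.
\end{aligned}
$$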
Proof. Notice that

$$\prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big) \le \Big(\frac{1}{p_{\min}} - 1\Big)^{n - |J|}.$$

The result then follows readily from Theorem 2.3. □



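Corollary 2.5. In addition to the assumptions of Theorem 2.3 and Corollary 2.4, suppose that there exists $m > 0$ such that $M_n(j) \le (j/n)^m$ for all $0 \le j \le n$. Let $k \ge \frac{1}{m} n \log n + \frac{1}{2m} n \log\big(\frac{1}{p_{\min}} - 1\big) + \frac{1}{m} cn$. Then

$$\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_{\mathrm{TV}} \le \tfrac{1}{2} \big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_2 \le \big(B(n, k) + e^{-2c}\big)^{1/2}.$$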
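Proof. It follows from Corollary 2.4 that

$$
\begin{aligned}
\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_{\mathrm{TV}}^2 &\le \tfrac{1}{4} \big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_2^2\\
&\le \frac{1}{4} B(n, k) \sum_{j=0}^{n} \binom{n}{j} \frac{n!}{j!} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-j} \Big(\frac{j}{n}\Big)^{2km} + \frac{1}{4} \sum_{j=0}^{n-1} \binom{n}{j} \frac{n!}{j!} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-j} \Big(\frac{j}{n}\Big)^{2km}.
\end{aligned}
\tag{2.6}
$$

If we let $i = n - j$, then the upper bound becomes

$$
\begin{aligned}
\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_{\mathrm{TV}}^2 &\le \tfrac{1}{4} \big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_2^2\\
&\le \frac{1}{4} B(n, k) \sum_{i=0}^{n} \binom{n}{i} \frac{n!}{(n-i)!} \Big(\frac{1}{p_{\min}} - 1\Big)^{i} \Big(1 - \frac{i}{n}\Big)^{2km} + \frac{1}{4} \sum_{i=1}^{n} \binom{n}{i} \frac{n!}{(n-i)!} \Big(\frac{1}{p_{\min}} - 1\Big)^{i} \Big(1 - \frac{i}{n}\Big)^{2km}.
\end{aligned}
$$

Notice that $\binom{n}{i} \frac{n!}{(n-i)!} \le \frac{n^{2i}}{i!}$ and $(1 - i/n)^{2km} \le e^{-2ikm/n}$, and that if $k \ge \frac{1}{m} n \log n + \frac{1}{2m} n \log\big(\frac{1}{p_{\min}} - 1\big) + \frac{1}{m} cn$, then

$$e^{-2ikm/n} \le \left( \frac{e^{-2c}}{\big(\frac{1}{p_{\min}} - 1\big) n^2} \right)^{i},$$

from which it follows that

$$
\begin{aligned}
\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_{\mathrm{TV}}^2 &\le \tfrac{1}{4} \big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_2^2\\
&\le \frac{1}{4} B(n, k) \sum_{i=0}^{n} \frac{(e^{-2c})^i}{i!} + \frac{1}{4} \sum_{i=1}^{n} \frac{(e^{-2c})^i}{i!}\\
&\le \tfrac{1}{4} B(n, k) \exp(e^{-2c}) + \tfrac{1}{4} e^{-2c} \exp(e^{-2c}).
\end{aligned}
$$

Since $c > 0$, we have $\exp(e^{-2c}) < e < 4$. Therefore

$$\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_{\mathrm{TV}}^2 \le \tfrac{1}{4} \big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_2^2 \le B(n, k) + e^{-2c},$$

from which the desired result follows. □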
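The chain of inequalities in this proof can be checked numerically. The sketch below is illustrative only (`corollary_2_5_tail` is a hypothetical name, natural logarithms are assumed, and `q` stands for $1/p_{\min} - 1$): it evaluates the second sum of (2.6), after the substitution $i = n - j$ and without the factor $1/4$, at the threshold value of $k$ from Corollary 2.5, so that it can be compared with the bound $e^{-2c} \exp(e^{-2c})$.

```python
import math

def corollary_2_5_tail(n, m, q, c):
    """sum_{i=1}^n C(n,i) n!/(n-i)! q^i (1 - i/n)^(2km), at the threshold k of Corollary 2.5.

    q plays the role of 1/p_min - 1 and is assumed to be >= 1 here, so that k > 0.
    """
    k = n * math.log(n) / m + n * math.log(q) / (2 * m) + c * n / m
    return sum(
        math.comb(n, i) * math.perm(n, i) * q ** i * (1 - i / n) ** (2 * k * m)
        for i in range(1, n + 1)
    )
```

For moderate $n$ (the sketch uses floating point, so very large $n$ would overflow the intermediate integers), the value stays below `math.exp(-2 * c) * math.exp(math.exp(-2 * c))`, as the proof predicts.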



Corollary 2.7. In addition to the assumptions of Theorem 2.3 and Corollary 2.4, suppose that a set with the distribution of $I(\hat\sigma)$ when $\hat\sigma$ has distribution $\hat Q$ can be constructed by first choosing a set size $1 \le \ell \le n$ according to a probability mass function $f_n(\cdot)$ and then choosing a set $L$ with $|L| = \ell$ uniformly among all such choices. Let $k \ge n \log n + \frac{1}{2} n \log\big(\frac{1}{p_{\min}} - 1\big) + cn$. Then

$$\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_{\mathrm{TV}} \le \tfrac{1}{2} \big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_2 \le \big(B(n, k) + e^{-2c}\big)^{1/2}.$$
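Proof. We apply Corollary 2.5. Notice that

$$\hat Q\{\hat\sigma \in \hat S_n : I(\hat\sigma) = L\} = \begin{cases} f_n(\ell) \big/ \binom{n}{\ell} & \text{if } |L| = \ell \ge 1,\\[2pt] 0 & \text{if } L = \emptyset. \end{cases}$$

Then, for any $J \subseteq [n]$ with $|J| = j$,

$$M_n(j) = \hat Q\{\hat\sigma \in \hat S_n : I(\hat\sigma) \subseteq J\} = \sum_{L \subseteq J} \hat Q\{\hat\sigma \in \hat S_n : I(\hat\sigma) = L\} = \sum_{\ell=1}^{j} \frac{\binom{j}{\ell}}{\binom{n}{\ell}} f_n(\ell) \le \frac{j}{n} \sum_{\ell=1}^{j} f_n(\ell) \le \frac{j}{n},$$

since $\binom{j}{\ell} / \binom{n}{\ell} \le j/n$ for $\ell \ge 1$. The result thus follows from Corollary 2.5, with $m = 1$. □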



Theorem 2.3, and its subsequent corollaries, can be used to bound the distance to stationarity of many different Markov chains $W$ on $G \wr S_n$ for which bounds on the $L^2$ distance to uniformity for the related random walks on $S_j$ for $1 \le j \le n$ are known. Theorem 1.2 provides such bounds for random walks generated by random transpositions, showing that $\frac{1}{2} j \log j$ steps are sufficient. Roussel (1999) has studied random walks on $S_n$ generated by permutations with $n - m$ fixed points for $m = 3, 4, 5$, and $6$. She has shown that $\frac{1}{m} n \log n$ steps are both necessary and sufficient.



Using Theorem 1.2, the following result establishes an upper bound on both the total variation distance and $\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\|_2$ in the special case when $\hat Q$ is defined by (1.1). Analogous results could be established using bounds for random walks generated by random $m$-cycles. When $P$ is the uniform distribution on $G$, the result reduces to Theorem 1.3.



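Corollary 2.8. Let $W$ be the Markov chain on the complete monomial group $G \wr S_n$ as in Theorem 2.3, where $\hat Q$ is the probability measure on $\hat S_n$ defined at (1.1). Let $k = \frac{1}{2} n \log n + \frac{1}{4} n \log\big(\frac{1}{p_{\min}} - 1\big) + \frac{1}{2} cn$. Then there exists a universal constant $b > 0$ such that

$$\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_{\mathrm{TV}} \le \tfrac{1}{2} \big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_2 \le b e^{-c} \quad \text{for all } c > 0.$$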
Proof. Let $\hat Q$ be defined by (1.1). For any set $J$ with $|J| = j$, it is clear that we have

$$\mu_n(J) = (j/n)^2 \quad \text{and} \quad d_k(J) = \big\|Q_{S_j}^{*k} - U_{S_j}\big\|_2^2,$$

where $Q_{S_j}$ is the measure on $S_j$ induced by (1.1) and $U_{S_j}$ is the uniform distribution on $S_j$.

It then follows from Theorem 1.2 that there exists a universal constant $a > 0$ such that $D_k(j) \le 4a^2 e^{-2c}$ for each $1 \le j \le n$, when $k \ge \frac{1}{2} j \log j + \frac{1}{2} cj$. Since $n \ge j$ and $p_{\min} \le 1/2$, this is also true when $k = \frac{1}{2} n \log n + \frac{1}{4} n \log\big(\frac{1}{p_{\min}} - 1\big) + \frac{1}{2} cn$.

It then follows from Corollary 2.5, with $m = 2$, that

$$\big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_{\mathrm{TV}}^2 \le \tfrac{1}{4} \big\|P^k((\vec x_0; e), \cdot) - P^\infty(\cdot)\big\|_2^2 \le 4a^2 e^{-2c} + e^{-2c} = (4a^2 + 1) e^{-2c},$$

from which the desired result follows. □

Corollary 2.8 shows that $k = \frac{1}{2} n \log n + \frac{1}{4} n \log\big(\frac{1}{p_{\min}} - 1\big) + \frac{1}{2} cn$ steps are sufficient for the $L^2$ distance, and hence also the total variation distance, to become small. A lower bound in the $L^2$ distance can also be derived by examining $n^2 \big(\frac{1}{p_{\min}} - 1\big) \big(1 - \frac{1}{n}\big)^{4k}$, which is the contribution, when $j = n - 1$ and $m = 2$, to the second summation of (2.6) from the proof of Corollary 2.5. In the present context, the second summation of (2.6) is the second summation in the statement of Theorem 2.3 with $\mu_n(J) = (|J|/n)^2$. Notice that $k = \frac{1}{2} n \log n + \frac{1}{4} n \log\big(\frac{1}{p_{\min}} - 1\big) - \frac{1}{2} cn$ steps are necessary for just this term to become small.
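The step counts above are closed-form expressions. The sketch below is illustrative only: the function names are hypothetical and natural logarithms are assumed. It evaluates them, confirms that Corollary 2.8 with $p_{\min} = 1/|G|$ and $c$ replaced by $2c$ reproduces the step count of Theorem 1.3, and evaluates the $j = n - 1$ summand that drives the lower bound.

```python
import math

def corollary_2_8_steps(n, p_min, c):
    """k = (1/2) n log n + (1/4) n log(1/p_min - 1) + (1/2) c n; assumes 0 < p_min <= 1/2."""
    if not 0 < p_min <= 0.5:
        raise ValueError("need 0 < p_min <= 1/2")
    return 0.5 * n * math.log(n) + 0.25 * n * math.log(1 / p_min - 1) + 0.5 * c * n

def theorem_1_3_steps(n, group_order, c):
    """k = (1/2) n log n + (1/4) n log(|G| - 1) + c n for P uniform on G, |G| >= 2."""
    return 0.5 * n * math.log(n) + 0.25 * n * math.log(group_order - 1) + c * n

def dominant_lower_term(n, p_min, k):
    """The j = n - 1 summand n^2 (1/p_min - 1) (1 - 1/n)^(4k) of the second sum of (2.6), m = 2."""
    return n ** 2 * (1 / p_min - 1) * (1 - 1 / n) ** (4 * k)
```

For example, with $n = 100$, $p_{\min} = 1/4$ and $c = 2$, the summand is roughly 53 at the lower-bound step count and roughly 0.017 at the step count of Corollary 2.8, illustrating the cutoff between the two.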

3. Markov Chains on $(G \wr S_n)/(S_r \times S_{n-r})$. We now analyze a very general Markov chain on the homogeneous space $(G \wr S_n)/(S_r \times S_{n-r})$. It should be noted that, in the results which follow, there is no essential use of the group structure on $G$. So the results of this section extend simply; in general, the Markov chain of interest is on the set $G^n \times (S_n/(S_r \times S_{n-r}))$.

3.1. A Class of Chains on $(G \wr S_n)/(S_r \times S_{n-r})$. Let $[n] := \{1, 2, \dots, n\}$ and let $[r] := \{1, 2, \dots, r\}$ where $1 \le r \le n/2$. Recall that the homogeneous space $X = S_n/(S_r \times S_{n-r})$ can be identified with the set of all $\binom{n}{r}$ subsets of size $r$ from $[n]$. Suppose that $x = \{i_1, i_2, \dots, i_r\} \subseteq [n]$ is such a subset and that $[n] \setminus x = \{j_{r+1}, j_{r+2}, \dots, j_n\}$. Let $\{i_{(1)}, i_{(2)}, \dots, i_{(k)}\} \subseteq x$ and $\{j_{(r+1)}, j_{(r+2)}, \dots, j_{(r+k)}\} \subseteq [n] \setminus x$ be the sets with all indices, listed in increasing order, such that $r + 1 \le i_{(\ell)} \le n$ and $1 \le j_{(r+\ell)} \le r$ for $1 \le \ell \le k$; in the Bernoulli-Laplace framework, these are the labels of the balls that are no longer in their respective initial racks. (Notice that if all the balls are on their initial racks, then both of these sets are empty.) To each element $x \in X$, we can thus correspond a unique permutation

$$\big(j_{(r+1)}\ i_{(1)}\big)\big(j_{(r+2)}\ i_{(2)}\big) \cdots \big(j_{(r+k)}\ i_{(k)}\big)$$

in $S_n$, which is the product of $k$ (disjoint) transpositions; when this permutation serves to represent an element of the homogeneous space $X$, we denote it by $\tilde\pi$. For example, if $x = \{2, 4, 8\} \in X = S_8/(S_3 \times S_5)$, then $\tilde\pi = (1\ 4)(3\ 8)$. (If all of the balls are on their initial racks, then $\tilde\pi = e$.) Notice that any given $\pi \in S_n$ corresponds to a unique $\tilde\pi \in X$; denote the mapping $\pi \mapsto \tilde\pi$ by $R$. For example, let $\pi$ be the permutation that sends $(1, 2, 3, 4, 5, 6, 7, 8)$ to $(8, 2, 4, 6, 7, 1, 5, 3)$; then $x = \{8, 2, 4\} = \{2, 4, 8\}$ and $\tilde\pi = R(\pi) = (1\ 4)(3\ 8)$.
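The correspondence $x \mapsto \tilde\pi$ is mechanical. The Python sketch below reproduces the example just given and also computes the fixed-point set of $\tilde\pi$. It is illustrative only: the function names are hypothetical, and $\pi$ is assumed to be given by its tuple of images $(\pi(1), \dots, \pi(n))$.

```python
def first_rack(images, r):
    """Serial numbers of the balls on the first rack: the first r images of pi."""
    return set(images[:r])

def rack_transpositions(x, r, n):
    """The disjoint transpositions (j_(r+l) i_(l)) representing the r-subset x of [n]."""
    i_list = sorted(b for b in x if b > r)  # balls on rack 1 that started on rack 2
    j_list = sorted(b for b in set(range(1, n + 1)) - set(x) if b <= r)  # the reverse
    return list(zip(j_list, i_list))

def fixed_points(transpositions, n):
    """Elements of [n] moved by none of the given transpositions."""
    moved = {a for pair in transpositions for a in pair}
    return set(range(1, n + 1)) - moved

x = first_rack((8, 2, 4, 6, 7, 1, 5, 3), r=3)
print(sorted(x))                          # [2, 4, 8]
print(rack_transpositions(x, r=3, n=8))   # [(1, 4), (3, 8)]
```

The two index lists always have the same length, since a ball can leave the first rack only if another ball takes its place.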

We now modify the concept of augmented permutation introduced in Section 2.1. Rather than the ordered pair of a permutation $\pi \in S_n$ and a subset $J$ of $F(\pi)$, we now take an augmented permutation to be the ordered pair of a permutation $\pi \in S_n$ and a subset $J$ of $F(R(\pi))$. In the above example, $F(R(\pi)) = F(\tilde{\pi}) = \{2, 5, 6, 7\}$. The necessity of this subtle difference will become apparent when defining $Q$. For $\hat{\pi} = (\pi, J) \in \hat{S}_n$ (defined in Section 2.1), define

$$I(\hat{\pi}) := I(R(\pi), J) = I(R(T(\hat{\pi})), J).$$

Thus $I(\hat{\pi})$ is the union of the set of indices deranged by $R(T(\hat{\pi}))$ and the subset $J$ of the fixed points of $R(T(\hat{\pi}))$.
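The effect of requiring $J \subseteq F(R(\pi))$ rather than $J \subseteq F(\pi)$ can be seen in a small sketch of $I(\hat{\pi})$. It assumes permutations in one-line notation (the $m$th entry is $\pi(m)$), computes the indices deranged by $R(\pi)$ directly from the image $\pi([r])$, and uses hypothetical function names; the example permutation $(1\ 2)$ with $r = 3$ is ours, not the paper's.

```python
def deranged_by_R(perm, r):
    """Indices deranged by R(pi): the elements of pi([r]) outside [r] together
    with the elements of [r] missing from pi([r])."""
    image = set(perm[:r])
    return {i for i in image if i > r} | {j for j in range(1, r + 1) if j not in image}


def I_hat(perm, J, r):
    """I(pi_hat) for the augmented permutation pi_hat = (pi, J), where J must be
    a subset of the fixed points F(R(pi))."""
    deranged = deranged_by_R(perm, r)
    if set(J) & deranged:
        raise ValueError("J must be contained in F(R(pi))")
    return deranged | set(J)


# pi = (1 2) only permutes the first rack when r = 3, so R(pi) = e and J = {1, 2}
# is allowed even though 1 and 2 are not fixed points of pi itself.
swap_12 = (2, 1, 3, 4, 5, 6, 7, 8)
print(I_hat(swap_12, {1, 2}, 3))                          # {1, 2}
print(sorted(I_hat((8, 2, 4, 6, 7, 1, 5, 3), {5}, 3)))    # [1, 3, 4, 5, 8]
```

Under the older convention ($J \subseteq F(\pi)$), the first call would be rejected, since $\pi = (1\ 2)$ moves both 1 and 2.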

Let $\hat{Q}$ be a probability measure on the augmented permutations $\hat{S}_n$ satisfying the augmented symmetry property (2.0). Let $Q$ be as described in Section 2.1.

Let $\hat{\xi}_1, \hat{\xi}_2, \ldots$ be a sequence of independent augmented permutations, each distributed according to $\hat{Q}$. These correspond uniquely to a sequence $\xi_1, \xi_2, \ldots$ of permutations, each distributed according to $Q$. Define $Y := (Y_0, Y_1, Y_2, \ldots)$ to be the Markov chain on $S_n/(S_r \times S_{n-r})$ such that $Y_0 := e$ and $Y_k := R(\xi_k Y_{k-1})$ for all $k \ge 1$.

Let $P$ be a probability measure defined on a finite group $G$, and let $p_i$ for $1 \le i \le |G|$ and $p_{\min} > 0$ be as defined previously. Define $X := (X_0, X_1, X_2, \ldots)$ to be the Markov chain on $G^n$ such that $X_0 := \vec{x}_0 = (\chi_1, \ldots, \chi_n)$ with $\chi_i \in G$ for $1 \le i \le n$ and, at each step $k \ge 1$, the entries of $X_{k-1}$ whose positions are included in $I(\hat{\xi}_k)$ are independently changed to elements of $G$ distributed according to $P$.

Define $W := (W_0, W_1, W_2, \ldots)$ to be the Markov chain on $(G \wr S_n)/(S_r \times S_{n-r})$ such that $W_k := (X_k; Y_k)$ for all $k \ge 0$. Notice that the signed generalization of the classical Bernoulli-Laplace diffusion model analyzed in Theorem 1.6 is a special case of $W$, with $P$ being the uniform distribution on $\mathbb{Z}_2$ and $\hat{Q}$ being defined as at (1.4).
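A single transition of $W$ can be sketched as follows. The sketch rests on two assumptions that go beyond the text: an element of $X$ is stored as the $r$-subset it represents, and $\xi_k$ acts on that subset by $x \mapsto \xi_k(x)$, which is our reading of $Y_k := R(\xi_k Y_{k-1})$ under the identification of the coset $\pi(S_r \times S_{n-r})$ with $\pi([r])$. The function names are hypothetical, and sampling $\hat{\xi}_k$ from $\hat{Q}$ is left to the caller.

```python
import random


def deranged_by_R(perm, r):
    """Indices deranged by R(pi), read off from the image pi([r])."""
    image = set(perm[:r])
    return {i for i in image if i > r} | {j for j in range(1, r + 1) if j not in image}


def w_step(state, xi_hat, r, group, weights, rng):
    """One transition of W = (X; Y) driven by the augmented permutation xi_hat = (xi, J).

    state = (vec, subset): vec is a tuple in G^n and subset is the r-subset
    representing Y_{k-1}.  Assumption: xi acts on subsets by x -> xi(x).
    """
    vec, subset = state
    xi, J = xi_hat
    new_subset = frozenset(xi[i - 1] for i in subset)     # Y_k
    resample = deranged_by_R(xi, r) | set(J)              # I(xi_hat)
    new_vec = list(vec)
    for i in sorted(resample):                            # independent draws from P
        new_vec[i - 1] = rng.choices(group, weights=weights)[0]
    return tuple(new_vec), new_subset


rng = random.Random(0)
start = ((0,) * 8, frozenset({1, 2, 3}))       # X_0 = (0, ..., 0) and Y_0 = e, with r = 3
swap_14 = (4, 2, 3, 1, 5, 6, 7, 8)             # the transposition (1 4) in one-line notation
vec, subset = w_step(start, (swap_14, set()), 3, [0, 1], [0.5, 0.5], rng)
print(sorted(subset))  # [2, 3, 4]
```

Here $I(\hat{\xi}) = \{1, 4\}$, so only the first and fourth entries of the vector can change; every other entry keeps its value from $X_0$.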

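Let $P(\cdot, \cdot)$ be the transition matrix for $W$ and let $P^\infty(\cdot)$ be the stationary distribution for $W$. Notice that

$$P^\infty(\vec{x}; \tilde{\pi}) = \frac{1}{\binom{n}{r}} \prod_{i=1}^{n} p_{x_i}$$

for any $(\vec{x}; \tilde{\pi}) \in (G \wr S_n)/(S_r \times S_{n-r})$ and that

$$P\big((\vec{x}; \tilde{\pi}), (\vec{y}; \tilde{\sigma})\big) = \sum_{\hat{\rho} \in \hat{S}_n :\, R(T(\hat{\rho})\tilde{\pi}) = \tilde{\sigma}} \hat{Q}(\hat{\rho}) \Big[\prod_{i \in I(\hat{\rho})} p_{y_i}\Big] \Big[\prod_{i \notin I(\hat{\rho})} I(x_i = y_i)\Big]$$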
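for any $(\vec{x}; \tilde{\pi}), (\vec{y}; \tilde{\sigma}) \in (G \wr S_n)/(S_r \times S_{n-r})$. Thus, using the augmented symmetry of $\hat{Q}$,

$$\begin{aligned}
P^\infty(\vec{x}; \tilde{\pi})\, P\big((\vec{x}; \tilde{\pi}), (\vec{y}; \tilde{\sigma})\big)
&= \frac{1}{\binom{n}{r}} \sum_{\hat{\rho} \in \hat{S}_n :\, R(T(\hat{\rho})\tilde{\pi}) = \tilde{\sigma}} \hat{Q}(\hat{\rho}) \Big[\prod_{i \in I(\hat{\rho})} p_{x_i} p_{y_i}\Big] \Big[\prod_{i \notin I(\hat{\rho})} p_{x_i} I(x_i = y_i)\Big] \\
&= \frac{1}{\binom{n}{r}} \sum_{\hat{\rho} \in \hat{S}_n :\, R(T(\hat{\rho})\tilde{\sigma}) = \tilde{\pi}} \hat{Q}(\hat{\rho}) \Big[\prod_{i \in I(\hat{\rho})} p_{y_i} p_{x_i}\Big] \Big[\prod_{i \notin I(\hat{\rho})} p_{y_i} I(y_i = x_i)\Big] \\
&= P^\infty(\vec{y}; \tilde{\sigma})\, P\big((\vec{y}; \tilde{\sigma}), (\vec{x}; \tilde{\pi})\big).
\end{aligned}$$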
Therefore, $P$ is reversible, which is a necessary condition in order to apply the comparison technique of Diaconis and Saloff-Coste (1993b).
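The reversibility computation can be checked exactly on a tiny example. In the sketch below, $n = 3$, $r = 1$, $G = \mathbb{Z}_2$, the measure $P$, and the measure $\hat{Q}$ are all hypothetical choices made for illustration (they are not the measure defined at (1.4)). This $\hat{Q}$ is supported on augmented permutations whose permutation part is an involution, so the symmetry used above holds trivially, and the chain is assumed to act on $r$-subsets by $x \mapsto \rho(x)$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

n, r = 3, 1
G = (0, 1)                                  # G = Z_2
p = {0: Fraction(1, 3), 1: Fraction(2, 3)}  # hypothetical P on G

# Hypothetical Q_hat: equal mass on ((i j), {}) for i in [r], j outside [r]
# (index set {i, j}) and on (e, {i}) for i in [n] (index set {i}).
# Entries: (transposition, or None for e; I(rho_hat); probability).
support = [((i, j), frozenset({i, j})) for i in range(1, r + 1) for j in range(r + 1, n + 1)]
support += [(None, frozenset({i})) for i in range(1, n + 1)]
q_hat = [(t, I, Fraction(1, len(support))) for t, I in support]


def act(t, subset):
    """Assumed left action of a transposition t (or the identity) on an r-subset."""
    if t is None:
        return subset
    a, b = t
    return frozenset(b if x == a else a if x == b else x for x in subset)


def p_inf(vec):
    """P_inf(x; pi~) = prod_i p_{x_i} / C(n, r)."""
    out = Fraction(1, comb(n, r))
    for g in vec:
        out *= p[g]
    return out


def transition(a, b):
    """P((x; pi~), (y; sigma~)) as in the displayed formula."""
    (x, s), (y, t) = a, b
    total = Fraction(0)
    for rho, I, q in q_hat:
        if act(rho, s) != t:
            continue
        term = q
        for i in range(1, n + 1):
            if i in I:
                term *= p[y[i - 1]]          # resampled entry
            elif x[i - 1] != y[i - 1]:
                term = Fraction(0)           # untouched entry must not change
                break
        total += term
    return total


subsets = [frozenset(c) for c in combinations(range(1, n + 1), r)]
states = [(v, s) for v in product(G, repeat=n) for s in subsets]
rows_sum_to_one = all(sum(transition(a, b) for b in states) == 1 for a in states)
reversible = all(
    p_inf(a[0]) * transition(a, b) == p_inf(b[0]) * transition(b, a)
    for a in states for b in states
)
print(rows_sum_to_one, reversible)
```

Exact rational arithmetic avoids any floating-point tolerance in the detailed-balance comparison.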

3.2. Convergence to Stationarity: Main Result. For any $J \subseteq [n]$, let $X^{(J)}$ be the homogeneous space $S(J)/\big(S(J \cap [r]) \times S(J \cap ([n] \setminus [r]))\big)$, where $S(J')$ is the subgroup of $S_n$ consisting of those $\sigma \in S_n$ with $[n] \setminus F(\sigma) \subseteq J'$. As in Section 3.1, let $\hat{Q}$ be a probability measure on the augmented permutations $\hat{S}_n$ satisfying the augmented symmetry property (2.0).



Let $Q$ and $Q_{S(J)}$ be as described in Sections 3.1 and 2.2. For notational purposes, let

$$\mu_n(J) := \hat{Q}\{\hat{\sigma} \in \hat{S}_n : I(\hat{\sigma}) \subseteq J\}. \tag{3.1}$$
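For a measure $\hat{Q}$ with finite support, $\mu_n(J)$ is simply the $\hat{Q}$-mass of the augmented permutations whose index set $I(\hat{\sigma})$ lies inside $J$. A minimal sketch, assuming $\hat{Q}$ is supplied as a list of pairs $(I(\hat{\sigma}), \hat{Q}(\hat{\sigma}))$; the function name and the example measure for $n = 4$, $r = 2$ are hypothetical.

```python
from fractions import Fraction


def mu_n(J, q_hat_I):
    """mu_n(J) = Q_hat{sigma_hat : I(sigma_hat) is a subset of J}, where q_hat_I
    lists pairs (I(sigma_hat), Q_hat(sigma_hat)) over the support of Q_hat."""
    J = frozenset(J)
    return sum((q for I, q in q_hat_I if I <= J), Fraction(0))


# Hypothetical Q_hat for n = 4, r = 2 (not the measure at (1.4)): mass 1/8 on each
# ((i j), {}) with i in {1, 2}, j in {3, 4} (I-set {i, j}) and on each (e, {i}) (I-set {i}).
q_hat_I = [(frozenset({i, j}), Fraction(1, 8)) for i in (1, 2) for j in (3, 4)]
q_hat_I += [(frozenset({i}), Fraction(1, 8)) for i in (1, 2, 3, 4)]

print(mu_n({1, 3}, q_hat_I))        # 3/8
print(mu_n({1, 2, 3, 4}, q_hat_I))  # 1
```

Only the index sets matter for $\mu_n$, which is why the sketch does not store the permutation parts at all.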

Let $Q_{X^{(J)}}(\cdot)$ be the probability measure on $X^{(J)}$ induced (as described in Section 2.2 of Schoolfield (1999b)) by $Q_{S(J)}$. Also let $U_{X^{(J)}}$ be the uniform measure on $X^{(J)}$. For notational purposes, let

$$d_k(J) := \binom{|J|}{|J \cap [r]|} \big\| Q_{X^{(J)}}^{*k} - U_{X^{(J)}} \big\|_2^2. \tag{3.2}$$



Example. Let $\hat{Q}$ be defined as at (1.4). Then $\hat{Q}$ satisfies the augmented symmetry property (2.0). In the Bernoulli-Laplace framework, the augmented permutations in the support of $\hat{Q}$ whose subset part is $\{j\}$ or $\{i, j\}$ leave the balls on their current racks but single out one or two of them, respectively, while those whose subset part is $\emptyset$ switch two balls between the racks. In Corollary 3.8 we will use $\hat{Q}$ to define a Markov chain on $(G \wr S_n)/(S_r \times S_{n-r})$ which is a generalization of the Markov chain analyzed in Theorem 1.6.

It is also easy to verify that $Q_{S(J)}$ is the probability measure defined at (1.4), but with $[r]$ and $[n] \setminus [r]$ changed to $J \cap [r]$ and $J \cap ([n] \setminus [r])$, respectively. Thus, roughly put, our generalization of the Markov chain analyzed in Theorem 1.6, conditionally restricted to the indices in $J$, gives a Markov chain on $(G \wr S(J))/\big(S(J \cap [r]) \times S(J \cap ([n] \setminus [r]))\big)$ "as if $J$ were the only indices."

The following result establishes an upper bound on the total variation distance by deriving an exact formula for $\|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_2$.
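Theorem 3.3. Let $W$ be the Markov chain on the homogeneous space $(G \wr S_n)/(S_r \times S_{n-r})$ defined in Section 3.1. Then

$$\begin{aligned}
\|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_{TV}^2
&\le \frac{1}{4} \|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_2^2 \\
&= \frac{1}{4} \sum_{J \subseteq [n]} \frac{\binom{n}{r}}{\binom{|J|}{|J \cap [r]|}}\, \mu_n(J)^{2k}\, d_k(J) \prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big) \\
&\quad + \frac{1}{4} \sum_{J \subsetneq [n]} \frac{\binom{n}{r}}{\binom{|J|}{|J \cap [r]|}}\, \mu_n(J)^{2k} \prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big),
\end{aligned}$$

where $\mu_n(J)$ and $d_k(J)$ are defined at (3.1) and (3.2), respectively.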
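Because the right-hand side of Theorem 3.3 is an exact formula, it can be compared with a brute-force computation on a small example. The sketch below does this with exact rational arithmetic for $n = 4$, $r = 2$, $k = 2$ and $G = \mathbb{Z}_2$. Several choices are assumptions made only for the illustration: $P$, $\vec{x}_0$ and $\hat{Q}$ are hypothetical (not the measure at (1.4)); the step acts on $r$-subsets by $x \mapsto \rho(x)$; $\|\cdot\|_2$ in (3.2) is read as the plain $\ell^2$ norm, so that $d_k(J) = \binom{|J|}{|J \cap [r]|} \sum_t Q^{*k}_{X^{(J)}}(t)^2 - 1$; and the $k$-step law on $X^{(J)}$ is realized as the chain driven by $\hat{Q}$ conditioned on $I(\hat{\rho}) \subseteq J$. With this $\hat{Q}$ the conditioned chain stays on the $\binom{|J|}{|J \cap [r]|}$ subsets that make up $X^{(J)}$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

n, r, k = 4, 2, 2
G = (0, 1)                                   # G = Z_2
p = {0: Fraction(1, 3), 1: Fraction(2, 3)}   # hypothetical P on G
chi = (0, 0, 0, 0)                           # hypothetical x_0 = (chi_1, ..., chi_n)
start = frozenset(range(1, r + 1))           # Y_0 = e, i.e. the subset [r]
full = frozenset(range(1, n + 1))

# Hypothetical Q_hat: equal mass on ((i j), {}) for i in [r], j outside [r]
# (index set {i, j}) and on (e, {i}) for i in [n] (index set {i}).
support = [((i, j), frozenset({i, j})) for i in range(1, r + 1) for j in range(r + 1, n + 1)]
support += [(None, frozenset({i})) for i in range(1, n + 1)]
q_hat = [(t, I, Fraction(1, len(support))) for t, I in support]


def act(t, subset):
    """Assumed left action of a transposition t (or the identity) on an r-subset."""
    if t is None:
        return subset
    a, b = t
    return frozenset(b if x == a else a if x == b else x for x in subset)


def p_inf(vec):
    out = Fraction(1, comb(n, r))
    for g in vec:
        out *= p[g]
    return out


def law_of_W(steps):
    """Exact law of W_k started at (x_0; e), by forward propagation."""
    dist = {(chi, start): Fraction(1)}
    for _ in range(steps):
        new = {}
        for (vec, s), w in dist.items():
            for t, I, q in q_hat:
                idx = sorted(I)
                for vals in product(G, repeat=len(idx)):   # fresh P-draws on I
                    v, pr = list(vec), w * q
                    for i, g in zip(idx, vals):
                        v[i - 1] = g
                        pr *= p[g]
                    key = (tuple(v), act(t, s))
                    new[key] = new.get(key, 0) + pr
        dist = new
    return dist


# ||P^k((x_0; e), .) - P_inf||_2^2 = sum_w P^k(w)^2 / P_inf(w) - 1
direct = sum(w * w / p_inf(vec) for (vec, _), w in law_of_W(k).items()) - 1

# Right-hand side of Theorem 3.3, without the common factor 1/4.
formula = Fraction(0)
for size in range(n + 1):
    for J in map(frozenset, combinations(range(1, n + 1), size)):
        allowed = [(t, q) for t, I, q in q_hat if I <= J]
        mu = sum((q for _, q in allowed), Fraction(0))     # mu_n(J)
        if mu == 0:                                        # term vanishes for k >= 1
            continue
        qJ = {start: Fraction(1)}                          # law of Y_k given H_k in J
        for _ in range(k):
            nxt = {}
            for s, w in qJ.items():
                for t, q in allowed:
                    nxt[act(t, s)] = nxt.get(act(t, s), 0) + w * q / mu
            qJ = nxt
        NJ = comb(len(J), len(J & start))                  # |X^(J)|
        d = NJ * sum(v * v for v in qJ.values()) - 1       # d_k(J), l^2 reading
        weight = Fraction(comb(n, r), NJ) * mu ** (2 * k)
        for i in full - J:
            weight *= 1 / p[chi[i - 1]] - 1
        formula += weight * (d + (0 if J == full else 1))

print(direct, formula, direct == formula)
```

Both quantities are computed without the common factor $\frac{1}{4}$, so they should agree exactly.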



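Proof. For each $k \ge 1$, let $H_k := \bigcup_{s=1}^{k} I(\hat{\xi}_s) \subseteq [n]$. For any given $w = (\vec{x}; \tilde{\pi}) \in (G \wr S_n)/(S_r \times S_{n-r})$, let $A \subseteq [n]$ be the set of indices such that $x_i \ne \chi_i$, where $x_i$ is the $i$th entry of $\vec{x}$ and $\chi_i$ is the $i$th entry of $\vec{x}_0$, and let $B = [n] \setminus F(\tilde{\pi})$ be the set of indices deranged by $\tilde{\pi}$. Notice that $H_k \supseteq A \cup B$ on the event $\{W_k = w\}$.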



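The proof continues exactly as in the proof of Theorem 2.3 to determine that

$$P(W_k = (\vec{x}; \tilde{\pi})) = \sum_{J :\, B \subseteq J \subseteq [n]} (-1)^{|J|}\, \mu_n(J)^k\, P(Y_k = \tilde{\pi} \mid H_k \subseteq J) \prod_{i=1}^{n} \big[1 - I_{A \cup J}(i) - p_{x_i}\big].$$

In particular, when $(\vec{x}; \tilde{\pi}) = (\vec{x}_0; e)$, we have $A = \emptyset = B$ and

$$\begin{aligned}
P(W_k = (\vec{x}_0; e)) &= \sum_{J \subseteq [n]} (-1)^{|J|}\, \mu_n(J)^k\, P(Y_k = e \mid H_k \subseteq J) \prod_{i=1}^{n} \big[1 - I_J(i) - p_{\chi_i}\big] \\
&= \Big[\prod_{i=1}^{n} p_{\chi_i}\Big] \sum_{J \subseteq [n]} \mu_n(J)^k\, P(Y_k = e \mid H_k \subseteq J) \prod_{i \notin J} \Big(\frac{1}{p_{\chi_i}} - 1\Big).
\end{aligned}$$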



Notice that $\{H_k \subseteq J\} = \bigcap_{\ell=1}^{k} \{I(\hat{\xi}_\ell) \subseteq J\}$ for any $k$ and $J$. So $\mathcal{L}((Y_0, Y_1, \ldots, Y_k) \mid H_k \subseteq J)$ is the law of a Markov chain on $S_n/(S_r \times S_{n-r})$ (through step $k$) with step distribution $Q_{X^{(J)}}$.
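The first display in this proof rests on an inclusion-exclusion identity. Writing $\delta_i := I(x_i = \chi_i)$, for each $J \subseteq [n]$ one has $\sum_{H \supseteq J} (-1)^{|H| - |J|} \prod_{i \in H} p_{x_i} \prod_{i \notin H} \delta_i = \prod_{i \in J} p_{x_i} \prod_{i \notin J} (\delta_i - p_{x_i}) = (-1)^{|J|} \prod_{i=1}^{n} [1 - I_{A \cup J}(i) - p_{x_i}]$. This is our reconstruction of the step carried over from the proof of Theorem 2.3, not a quotation of it. The sketch below checks the identity with exact arithmetic for $n = 3$, all choices of $A$ and $J$, and arbitrary (hypothetical) values standing in for the $p_{x_i}$.

```python
from fractions import Fraction
from itertools import combinations


def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for c in combinations(items, size):
            yield frozenset(c)


def signed_product(J, A, p, n):
    """(-1)^{|J|} * prod_{i=1}^n [1 - I_{A u J}(i) - p_i]."""
    out = Fraction((-1) ** len(J))
    for i in range(1, n + 1):
        out *= (0 if i in A | J else 1) - p[i]
    return out


def factored(J, A, p, n):
    """prod_{i in J} p_i * prod_{i not in J} [delta_i - p_i], delta_i = 1 iff i not in A."""
    out = Fraction(1)
    for i in range(1, n + 1):
        delta = 0 if i in A else 1
        out *= p[i] if i in J else delta - p[i]
    return out


def mobius_sum(J, A, p, n):
    """sum over H containing J of (-1)^{|H|-|J|} prod_{i in H} p_i prod_{i not in H} delta_i."""
    total = Fraction(0)
    for extra in subsets(set(range(1, n + 1)) - J):
        H = J | extra
        term = Fraction((-1) ** len(extra))
        for i in range(1, n + 1):
            term *= p[i] if i in H else (0 if i in A else 1)
        total += term
    return total


n = 3
p = {1: Fraction(1, 3), 2: Fraction(1, 5), 3: Fraction(1, 2)}   # stand-ins for p_{x_i}
checks = [
    signed_product(J, A, p, n) == factored(J, A, p, n) == mobius_sum(J, A, p, n)
    for A in subsets(range(1, n + 1))
    for J in subsets(range(1, n + 1))
]
print(all(checks))  # True
```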
where the last sum must be modified to exclude the term for $i = r$ and $j = n - r$.
Proof. The proof is analogous to that of Corollary 2.4. □



Corollary 3.5. In addition to the assumptions of Theorem 3.3 and Corollary 3.4, suppose that there exists $m > 0$ such that $\mu_n(J) \le (|J|/n)^m$ for all $J \subseteq [n]$. Let $c > 0$ and $k \ge \frac{n}{2m}\big(\log n + \log\big(\frac{1}{p_{\min}} - 1\big) + c\big)$. Then

$$\|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_{TV} \le \frac{1}{2} \|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_2 \le \frac{e}{2}\big(B(n, k) + 2e^{-c}\big)^{1/2}.$$
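Proof. It follows from Corollary 3.4 that

$$\begin{aligned}
\|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_{TV}^2
&\le \frac{1}{4} \|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_2^2 \\
&\le \frac{1}{4} B(n, k) \sum_{i=0}^{r} \sum_{j=0}^{n-r} \binom{r}{i}\binom{n-r}{j} \frac{\binom{n}{r}}{\binom{i+j}{i}} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-(i+j)} \Big(\frac{i+j}{n}\Big)^{2km} \\
&\quad + \frac{1}{4} \sum_{i=0}^{r} \sum_{j=0}^{n-r} \binom{r}{i}\binom{n-r}{j} \frac{\binom{n}{r}}{\binom{i+j}{i}} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-(i+j)} \Big(\frac{i+j}{n}\Big)^{2km},
\end{aligned} \tag{3.6}$$

where the last sum must be modified to exclude the term for $i = r$ and $j = n - r$. Notice that

$$\binom{r}{i}\binom{n-r}{j} \frac{\binom{n}{r}}{\binom{i+j}{i}} = \binom{n}{i+j}\binom{n-(i+j)}{r-i}.$$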
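The binomial identity above can be checked mechanically; algebraically, both sides equal $\frac{n!}{(i+j)!\,(r-i)!\,(n-r-j)!}$. A short exact check over a range of small parameters (the function names are illustrative only):

```python
from fractions import Fraction
from math import comb


def left(n, r, i, j):
    """C(r, i) C(n-r, j) C(n, r) / C(i+j, i) as an exact fraction."""
    return Fraction(comb(r, i) * comb(n - r, j) * comb(n, r), comb(i + j, i))


def right(n, r, i, j):
    """C(n, i+j) C(n-(i+j), r-i)."""
    return comb(n, i + j) * comb(n - (i + j), r - i)


ok = all(
    left(n, r, i, j) == right(n, r, i, j)
    for n in range(2, 13)
    for r in range(1, n // 2 + 1)
    for i in range(r + 1)
    for j in range(n - r + 1)
)
print(ok)  # True
```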

Thus, if we put $j' = i + j$ and change the order of summation, we have (enacting now the required modification)

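$$\begin{aligned}
\|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_{TV}^2
&\le \frac{1}{4} \|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_2^2 \\
&\le \frac{1}{4} B(n, k) \sum_{j'=0}^{n} \binom{n}{j'} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-j'} \Big(\frac{j'}{n}\Big)^{2km} \sum_{i=0 \vee (r-(n-j'))}^{r \wedge j'} \binom{n-j'}{r-i} \\
&\quad + \frac{1}{4} \sum_{j'=0}^{n-1} \binom{n}{j'} \Big(\frac{1}{p_{\min}} - 1\Big)^{n-j'} \Big(\frac{j'}{n}\Big)^{2km} \sum_{i=0 \vee (r-(n-j'))}^{r \wedge j'} \binom{n-j'}{r-i}.
\end{aligned}$$

Of course, $\sum_{i=0 \vee (r-(n-j'))}^{r \wedge j'} \binom{n-j'}{r-i} \le 2^{n-j'}$, so, putting $i = n - j'$, the upper bound becomes

$$\begin{aligned}
\|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_{TV}^2
&\le \frac{1}{4} \|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_2^2 \\
&\le \frac{1}{4} B(n, k) \sum_{i=0}^{n} \binom{n}{i}\, 2^i \Big(\frac{1}{p_{\min}} - 1\Big)^{i} \Big(1 - \frac{i}{n}\Big)^{2km} + \frac{1}{4} \sum_{i=1}^{n} \binom{n}{i}\, 2^i \Big(\frac{1}{p_{\min}} - 1\Big)^{i} \Big(1 - \frac{i}{n}\Big)^{2km} \\
&\le \frac{1}{4} B(n, k) \sum_{i=0}^{n} \frac{(2n)^i}{i!} \Big(\frac{1}{p_{\min}} - 1\Big)^{i} e^{-2ikm/n} + \frac{1}{4} \sum_{i=1}^{n} \frac{(2n)^i}{i!} \Big(\frac{1}{p_{\min}} - 1\Big)^{i} e^{-2ikm/n}.
\end{aligned}$$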

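Notice that if $k \ge \frac{n}{2m}\big(\log n + \log\big(\frac{1}{p_{\min}} - 1\big) + c\big)$, then, for each $0 \le i \le n$,

$$e^{-2ikm/n} \le \Big[n\Big(\frac{1}{p_{\min}} - 1\Big)\Big]^{-i} e^{-ci},$$

from which it follows that

$$\begin{aligned}
\|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_{TV}^2
&\le \frac{1}{4} \|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_2^2 \\
&\le \frac{1}{4} B(n, k) \sum_{i=0}^{n} \frac{1}{i!} \big(2e^{-c}\big)^i + \frac{1}{4} \sum_{i=1}^{n} \frac{1}{i!} \big(2e^{-c}\big)^i \\
&\le \frac{1}{4} B(n, k) \exp\big(2e^{-c}\big) + \frac{1}{2} e^{-c} \exp\big(2e^{-c}\big).
\end{aligned}$$

Since $c > 0$, we have $\exp(2e^{-c}) < e^2$. Therefore

$$\|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_{TV}^2 \le \frac{1}{4} \|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_2^2 \le \frac{e^2}{4}\big(B(n, k) + 2e^{-c}\big),$$

from which the desired result follows. □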
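The two elementary estimates used in this proof, $\binom{n}{i} \le n^i/i!$ and $(1 - i/n)^{2km} \le e^{-2ikm/n}$, can be sanity-checked numerically. The sketch below takes $k$ to be the smallest integer satisfying the hypothesis of Corollary 3.5 and checks that the two sums bounded in the proof stay below $\exp(2e^{-c})$ and $2e^{-c}\exp(2e^{-c})$, respectively; the parameter tuples are arbitrary illustrations, not values from the paper.

```python
import math


def bounded_sums(n, m, p_min, k):
    """Return (S0, S1), where S0 = sum_{i=0}^n C(n,i) 2^i (1/p_min - 1)^i (1 - i/n)^(2km)
    and S1 is the same sum restricted to 1 <= i <= n."""
    a = 1 / p_min - 1
    terms = [math.comb(n, i) * (2 * a) ** i * (1 - i / n) ** (2 * k * m) for i in range(n + 1)]
    return sum(terms), sum(terms[1:])


results = []
for n, m, p_min, c in [(10, 1, 0.5, 1.0), (50, 2, 0.25, 0.5), (200, 2, 0.1, 3.0)]:
    # smallest integer k satisfying the hypothesis of Corollary 3.5
    k = math.ceil(n / (2 * m) * (math.log(n) + math.log(1 / p_min - 1) + c))
    s0, s1 = bounded_sums(n, m, p_min, k)
    x = 2 * math.exp(-c)
    results.append((s0 <= math.exp(x), s1 <= x * math.exp(x)))
print(results)
```

Each entry of `results` pairs the check for the first sum (including $i = 0$) with the check for the second sum (starting at $i = 1$).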



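Corollary 3.7. In addition to the assumptions of Theorem 3.3 and Corollary 3.4, suppose that a set with the distribution of $I(\hat{\sigma})$ when $\hat{\sigma}$ has distribution $\hat{Q}$ can be constructed by first choosing a set size $0 \le \ell \le n$ according to a probability mass function $f_n(\cdot)$ and then choosing a set $L$ with $|L| = \ell$ uniformly among all such choices. Let $c > 0$ and $k \ge \frac{1}{2} n \big(\log n + \log\big(\frac{1}{p_{\min}} - 1\big) + c\big)$. Then

$$\|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_{TV} \le \frac{1}{2} \|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_2 \le \frac{e}{2}\big(B(n, k) + 2e^{-c}\big)^{1/2}.$$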
Proof. The proof is analogous to that of Corollary 2.7. □

Theorem 3.3 and its subsequent corollaries can be used to bound the distance to stationarity of many different Markov chains $W$ on $(G \wr S_n)/(S_r \times S_{n-r})$ for which bounds on the $L^2$ distance to uniformity for the related Markov chains on $S_{i+j}/(S_i \times S_j)$ for $0 \le i \le r$ and $0 \le j \le n - r$ are known. As an example, the following result establishes an upper bound on both the total variation distance and $\|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_2$ in the special case when $\hat{Q}$ is defined by (1.4). This corollary actually fits the framework of Corollary 3.7, but the resulting bound is better than the one obtained by merely applying Corollary 3.7. When $G = \mathbb{Z}_2$ and $P$ is the uniform distribution on $G$, the result reduces to Theorem 1.6.



Corollary 3.8. Let $W$ be the Markov chain on the homogeneous space $(G \wr S_n)/(S_r \times S_{n-r})$ as in Theorem 3.3, where $\hat{Q}$ is the probability measure on $\hat{S}_n$ defined at (1.4). Let $k = \frac{1}{4} n \big(\log n + \log\big(\frac{1}{p_{\min}} - 1\big) + c\big)$. Then there exists a universal constant $b > 0$ such that

$$\|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_{TV} \le \frac{1}{2} \|P^k((\vec{x}_0; e), \cdot) - P^\infty(\cdot)\|_2 \le b e^{-c/2} \quad \text{for all } c > 0.$$

Proof. The proof is analogous to that of Corollary 2.8. □

Corollary 3.8 shows that $k = \frac{1}{4} n \big(\log n + \log\big(\frac{1}{p_{\min}} - 1\big) + c\big)$ steps are sufficient for the $L^2$ distance, and hence also the total variation distance, to become small. A lower bound in the $L^2$ distance can also be derived by examining $2n\big(\frac{1}{p_{\min}} - 1\big)\big(1 - \frac{1}{n}\big)^{4k}$, which is the contribution, when $i + j = n - 1$ and $m = 2$, to the second summation of (3.6) from the proof of Corollary 3.5. In the present context, the second summation of (3.6) is the second summation in the statement of Theorem 3.3 with $\mu_n(J) = (|J|/n)^2$. Notice that $k = \frac{1}{4} n \big(\log n + \log\big(\frac{1}{p_{\min}} - 1\big) - c\big)$ steps are necessary for just this term to become small.
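The lower-bound heuristic can be illustrated numerically: at the rounded-down step count $k = \lfloor \frac{1}{4} n(\log n + \log(\frac{1}{p_{\min}} - 1) - c) \rfloor$, the single term $2n(\frac{1}{p_{\min}} - 1)(1 - \frac{1}{n})^{4k}$ is still at least $e^{c}$ for the sample parameters below, which are chosen arbitrarily for illustration. For large $n$ this term is close to $2e^{c}$.

```python
import math


def lower_bound_term(n, p_min, k):
    """2n (1/p_min - 1) (1 - 1/n)^(4k): the i + j = n - 1, m = 2 contribution."""
    return 2 * n * (1 / p_min - 1) * (1 - 1 / n) ** (4 * k)


results = []
for n, p_min, c in [(10, 0.5, 1.0), (100, 0.5, 2.0), (1000, 0.25, 3.0)]:
    k = math.floor(n / 4 * (math.log(n) + math.log(1 / p_min - 1) - c))
    results.append(lower_bound_term(n, p_min, k) >= math.exp(c))
print(results)
```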